Optimal-Hash Exact String Matching Algorithms
String matching is the problem of finding all the occurrences of a pattern in
a text. We propose improved versions of the fast family of string matching
algorithms based on hashing q-grams. The improvement consists of considering
minimal values such that each q-gram of the pattern has a unique hash
value. The new algorithms are faster than the algorithms of the HASH family for
short patterns on large alphabets. Comment: 14 pages
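As a rough illustration of what matching by hashing q-grams can look like, here is a minimal Python sketch of a Horspool-style search whose shift table is indexed by the hash of a q-gram. It is not the algorithm proposed in the paper: the hash function, the table size, and the names `qgram_hash` and `qgram_search` are assumptions made for this example, and the sketch makes no attempt to give each q-gram of the pattern a unique hash value.

```python
def qgram_hash(s, end, q, mask):
    """Hash the q-gram s[end-q+1 .. end] with shifts and adds (assumed scheme)."""
    h = 0
    for c in s[end - q + 1:end + 1]:
        h = ((h << 1) + ord(c)) & mask
    return h


def qgram_search(pattern, text, q=3, table_bits=8):
    """Return the start positions of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or n < m:
        return []
    q = min(q, m)                      # a q-gram cannot be longer than the pattern
    mask = (1 << table_bits) - 1

    # Default shift: the text q-gram does not occur in the pattern at all.
    shift = [m - q + 1] * (mask + 1)
    # For each pattern q-gram ending at j, store its distance to the pattern end.
    # Later (rightmost) q-grams overwrite earlier ones, so the smallest shift wins.
    for j in range(q - 1, m):
        shift[qgram_hash(pattern, j, q, mask)] = m - 1 - j

    hits = []
    pos = m - 1                        # index in text of the last window character
    while pos < n:
        s = shift[qgram_hash(text, pos, q, mask)]
        if s == 0:                     # hash equals that of the pattern's last q-gram
            start = pos - m + 1
            if text[start:pos + 1] == pattern:   # verify, since hashes may collide
                hits.append(start)
            s = 1
        pos += s
    return hits
```

Hash collisions in this sketch can only shorten a shift or trigger an extra verification, so the result stays exact; fewer collisions among the pattern's q-grams mean longer shifts on average.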
A fast implementation of the Boyer-Moore string matching algorithm
Manuscript, http://www-igm.univ-mlv.fr/~lecroq/articles/cl2008.pd
Efficient Pattern Matching on Binary Strings
The binary string matching problem consists in finding all the occurrences of
a pattern in a text where both strings are built on a binary alphabet. This is
an interesting problem in computer science, since binary data are omnipresent
in telecom and computer network applications. The problem also has
applications in image processing and in pattern matching on
compressed texts. It has recently been shown that adaptations of classical
exact string matching algorithms are not very efficient on binary data. In this
paper we present two efficient algorithms for the problem, designed to avoid
any reference to bits entirely, so that pattern and text can be processed byte by byte.
Experimental results show that the new algorithms outperform existing solutions
in most cases. Comment: 12 pages
Algorithms for Computing Abelian Periods of Words
Constantinescu and Ilie (Bulletin EATCS 89, 167--170, 2006) introduced the
notion of an Abelian period of a word. A word of length over an
alphabet of size can have distinct Abelian periods.
The Brute-Force algorithm computes all the Abelian periods of a word in time
using space. We present an off-line
algorithm based on a select function having the same worst-case theoretical
complexity as the Brute-Force one, but outperforming it in practice. We then
present on-line algorithms that also make it possible to compute all the Abelian periods
of all the prefixes of the word. Comment: Accepted for publication in Discrete Applied Mathematics
Fast Computation of Abelian Runs
Given a word w and a Parikh vector P, an abelian run of period P
in w is a maximal occurrence of a substring of w having
abelian period P. Our main result is an online algorithm that,
given a word of length over an alphabet of cardinality and a
Parikh vector , returns all the abelian runs of period
in in time and space , where is the
norm of , i.e., the sum of its components. We also present an
online algorithm that computes all the abelian runs with periods of norm in
in time , for any given norm . Finally, we give an -time
offline randomized algorithm for computing all the abelian runs of . Its
deterministic counterpart runs in time. Comment: To appear in Theoretical Computer Science
A Note on Easy and Efficient Computation of Full Abelian Periods of a Word
Constantinescu and Ilie (Bulletin of the EATCS 89, 167-170, 2006) introduced
the idea of an Abelian period with head and tail of a finite word. An Abelian
period is called full if both the head and the tail are empty. We present a
simple and easy-to-implement -time algorithm for computing all
the full Abelian periods of a word of length over a constant-size alphabet.
Experiments show that our algorithm significantly outperforms the
algorithm proposed by Kociumaka et al. (Proc. of STACS, 245-256, 2013) for the
same problem. Comment: Accepted for publication in Discrete Applied Mathematics
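To make the notion of a full Abelian period concrete, the following Python sketch checks it by brute force. It assumes the usual reading of the definition: with empty head and tail, a word of length n has a full Abelian period of length p when p divides n and all consecutive blocks of length p contain the same letters with the same multiplicities. The function name `full_abelian_periods` is hypothetical, and this naive check is not the algorithm of the paper.

```python
from collections import Counter


def full_abelian_periods(w):
    """Return the block lengths p for which w splits into len(w)//p blocks
    that are all permutations of one another (naive check)."""
    n = len(w)
    periods = []
    for p in range(1, n + 1):
        if n % p:                      # empty head and tail force p to divide n
            continue
        first = Counter(w[:p])         # letter counts of the first block
        if all(Counter(w[i:i + p]) == first for i in range(p, n, p)):
            periods.append(p)
    return periods


print(full_abelian_periods("abbaab"))  # [2, 6]
```

For "abbaab" the blocks ab, ba, ab have the same letter counts, so 2 is reported; the length of the whole word is always reported because a single block trivially satisfies the condition.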
Efficient validation and construction of border arrays
In this article we present an on-line, linear time and space algorithm to check whether an integer array f is the border array of at least one string w built on a bounded or unbounded size alphabet Σ. We first show some relations between the border array of a string w and the skeleton of the DFA recognizing Σ*·w, independently of the explicit knowledge of w. This enables us to design algorithms for validating and generating border arrays that outperform existing ones [4, 3]. The validating algorithm lowers the delay (the time spent on one element of the array) from O(|w|) to O(min{|Σ|, |w|}) compared with the algorithms in [4, 3]. Finally, we give some results on the number of distinct border arrays for some alphabet sizes.
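For readers unfamiliar with border arrays, the Python sketch below computes one with the classical linear-time method and adds an exhaustive validity check. It assumes 0-based indexing, with f[i] the length of the longest proper border of w[0..i]; other texts index differently. Both function names are hypothetical, and the exhaustive check is exponential in the array length, so it only illustrates the validation problem and is unrelated to the on-line algorithm of the article.

```python
from itertools import product


def border_array(w):
    """f[i] = length of the longest proper border of the prefix w[0..i]."""
    f = [0] * len(w)
    k = 0                               # length of the current border
    for i in range(1, len(w)):
        while k > 0 and w[i] != w[k]:
            k = f[k - 1]                # fall back to the next shorter border
        if w[i] == w[k]:
            k += 1
        f[i] = k
    return f


def is_border_array_bruteforce(f, alphabet):
    """Exhaustive check over all strings of length len(f) on the given alphabet.
    Exponential time: meant for tiny inputs only."""
    return any(border_array(w) == list(f)
               for w in product(alphabet, repeat=len(f)))


print(border_array("abaab"))                           # [0, 0, 1, 1, 2]
print(is_border_array_bruteforce([0, 1, 0, 2], "ab"))  # False
```

The array [0, 1, 0, 2] is rejected because f[1] = 1 forces the first two letters to be equal, f[2] = 0 forces the third to differ from the first, and f[3] = 2 would then require the third letter to equal the first.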